Automating psycholinguistic statistics computation: Procura-PALavras

Authors

  • João F. Machado
  • José João Almeida
  • Alberto Simões
  • Ana Soares
Abstract

This article describes psycholinguistic lexical databases available in various languages, including English, Spanish and Portuguese. These databases are important for researchers in Psycholinguistics and related areas: they provide a pool of experimental materials and make the selection of those materials efficient. Gathering the underlying statistics is slow, so the pool of available materials stays small in the short term. The need for an alternative way to obtain statistics that are limited or not yet available for a specific language led us to consider gathering statistics from other languages and computing their triangulation. Our aim was to automate the computation of statistics such as Familiarity, Imageability, Age of Acquisition and Written Word Frequency for that specific language. We describe the process of preparing this data, then triangulating and comparing statistics for several languages in an attempt to find a relationship between them. The results were analysed by considering the correlations between each statistic in each pair of languages and by computing the mean of the absolute differences between each language's values.
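The two comparison measures named at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which correlation coefficient was used, so Pearson's r is assumed here, and the function names, the word-pairing dictionary and the ratings are hypothetical values made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_abs_diff(xs, ys):
    """Mean of the absolute differences between paired values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def compare(stat_a, stat_b, translation):
    """Compare one statistic across two languages.

    stat_a, stat_b: word -> value for each language.
    translation: word in language A -> word in language B
    (a hypothetical pairing; how words are aligned is an assumption).
    """
    pairs = [
        (stat_a[a], stat_b[b])
        for a, b in translation.items()
        if a in stat_a and b in stat_b
    ]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson(xs, ys), mean_abs_diff(xs, ys)


if __name__ == "__main__":
    # Made-up familiarity ratings, for illustration only.
    english = {"dog": 6.5, "house": 6.8, "quill": 2.9}
    spanish = {"perro": 6.4, "casa": 6.9, "pluma": 4.1}
    pairing = {"dog": "perro", "house": "casa", "quill": "pluma"}
    r, mad = compare(english, spanish, pairing)
    print(f"correlation = {r:.3f}, mean absolute difference = {mad:.3f}")
```

Reporting both numbers is useful because they answer different questions: a high correlation says two languages rank words similarly, while the mean absolute difference says how far apart the values are on the rating scale itself.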

Similar resources

P-PAL: Uma base lexical com índices psicolinguísticos do Português Europeu

In this work we present the Procura-PALavras (P-PAL) project, whose main goal is to develop an electronic tool that provides information on objective and subjective psycholinguistic indices of European Portuguese (EP) words. P-PAL will be made freely available to the scientific community in a user-friendly format from a website to be built for the purpose...

Reconhecimento de Palavras Manuscritas usando Modelos de Markov

This paper presents a handwriting recognition system that deals with unconstrained handwriting and large vocabularies. The system is based on a segmentation–recognition paradigm where words are first loosely segmented into characters and the final segmentation is obtained during the recognition process, which is driven by a lexicon. Characters are modeled by multiple hidden Markov models (HMMs)...

On the advantages of word frequency and contextual diversity measures extracted from subtitles: The case of Portuguese.

We examined the potential advantage of the lexical databases using subtitles and present SUBTLEX-PT, a new lexical database for 132,710 Portuguese words obtained from a 78 million corpus based on film and television series subtitles, offering word frequency and contextual diversity measures. Additionally we validated SUBTLEX-PT with a lexical decision study involving 1920 Portuguese words (and ...

Industrial coagglomeration: some state-level evidence for Brazil

Abstract: The article quantifies industrial coagglomeration between pairs of manufacturing sectors in the State of Rio de Janeiro in 2010. To that end, it uses the coagglomeration index proposed by Ellison et al. (2010) and seeks to relate it to indicators that proxy the use of similar workers (labor pooling), proximity to suppliers and customers, and natural advantages...

Learning Summary Statistic for Approximate Bayesian Computation via Deep Neural Network

Approximate Bayesian Computation (ABC) methods are used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Both the accuracy and computational efficiency of ABC depend on the choice of summary statistic, but outside of special cases where the optimal summary statistics are known, it is unclear which guiding principles can be used to constru...

Publication date: 2010